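We propose an efficient, interpretable neuro-symbolic model for solving inductive logic programming (ILP) problems. The model is built from a set of meta-rules organized in a hierarchical structure, and it invents first-order rules by learning embeddings that match facts to the body predicates of a meta-rule. To instantiate it, we design an expressive set of generic meta-rules and show that they generate a fragment of Horn clauses. During training, we inject controlled Gumbel noise to avoid local optima and use an interpretability-regularization term to further guide convergence toward interpretable rules. We empirically validate our model on a range of tasks (ILP, Visual Genome, reinforcement learning) against several state-of-the-art methods.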
Translated by Google Translate
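The controlled Gumbel noise mentioned in the first abstract can be illustrated with a minimal sketch. The abstract does not say how the noise is applied, so this assumes it perturbs similarity scores between a meta-rule slot and candidate predicates before a softmax; `sample_gumbel`, `gumbel_softmax`, `noise_scale`, and `temperature` are hypothetical names, not the authors' API.

```python
import math
import random
from typing import List


def sample_gumbel(rng: random.Random) -> float:
    """Draw one standard Gumbel(0, 1) sample via inverse transform."""
    u = max(rng.random(), 1e-12)  # keep u strictly positive so log(u) is defined
    return -math.log(-math.log(u))


def gumbel_softmax(
    logits: List[float],
    noise_scale: float,
    temperature: float,
    rng: random.Random,
) -> List[float]:
    """Softmax over logits perturbed by scaled Gumbel noise (illustrative)."""
    # Assumption: noise_scale is the "controlled" knob; 0.0 disables the noise.
    noisy = [x + noise_scale * sample_gumbel(rng) for x in logits]
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp((x - peak) / temperature) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

With `noise_scale=0.0` the function reduces to an ordinary tempered softmax. Larger values let lower-scoring predicates be selected occasionally, which is one plausible way such noise could help escape local optima.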
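Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stakes domains such as autonomous driving or medical applications. In such contexts, a learned policy needs, for instance, to be interpretable so that it can be inspected before any deployment (e.g., for safety and verifiability reasons). This survey provides an overview of approaches for achieving higher interpretability in reinforcement learning (RL). To that end, we distinguish interpretability (a property of a model) from explainability (a post-hoc operation that involves a proxy) and discuss both in the context of RL, with an emphasis on the former. In particular, we argue that interpretable RL may embrace several facets: interpretable inputs, interpretable (transition/reward) models, and interpretable decision-making. Based on this scheme, we summarize and analyze recent work related to interpretable RL, focusing on papers published in the past 10 years. We also briefly discuss some related research areas and point to potentially promising research directions.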
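Integrating reasoning, learning, and decision-making is key to building more general AI systems. As a step in this direction, we propose a novel neural-logic architecture that can solve both inductive logic programming (ILP) and deep reinforcement learning (RL) problems. Our architecture defines a restricted but expressive continuous space of first-order logic programs by assigning weights to predicates instead of rules. It is therefore fully differentiable and can be trained efficiently with gradient descent. In addition, in the deep RL setting with actor-critic algorithms, we propose a novel, efficient critic architecture. Compared with state-of-the-art methods on both ILP and RL problems, our proposal achieves excellent performance while providing fully interpretable solutions and scaling much better, especially during the testing phase.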
We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task. The agent can use its experience in each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm to meta-learn the underlying structure across tasks, utilize it to plan in each task, and upper-bound the regret of the planning loss. Our bound suggests that the average regret over tasks decreases as the number of tasks increases and as the tasks become more similar. In the classical single-task setting, it is known that the planning horizon should depend on the estimated model's accuracy, that is, on the number of samples within a task. We generalize this finding to meta-RL and study how the planning horizon depends on the number of tasks. Based on our theoretical findings, we derive heuristics for selecting slowly increasing discount factors, and we validate their significance empirically.
Underwater images are altered by the physical characteristics of the medium through which light rays pass before reaching the optical sensor. Scattering and strong wavelength-dependent absorption significantly modify the captured colors depending on the distance of observed elements to the image plane. In this paper, we aim to recover the original colors of the scene as if the water had no effect on them. We propose two novel methods that rely on different sets of inputs. The first assumes that pixel intensities in the restored image are normally distributed within each color channel, leading to an alternative optimization of the well-known \textit{Sea-thru} method which acts on single images and their distance maps. We additionally introduce SUCRe, a new method that further exploits the scene's 3D Structure for Underwater Color Restoration. By following points in multiple images and tracking their intensities at different distances to the sensor we constrain the optimization of the image formation model parameters. When compared to similar existing approaches, SUCRe provides clear improvements in a variety of scenarios ranging from natural light to deep-sea environments. The code for both approaches is publicly available at https://github.com/clementinboittiaux/sucre .
We study representation learning for efficient imitation learning over linear systems. In particular, we consider a setting where learning is split into two phases: (a) a pre-training step where a shared $k$-dimensional representation is learned from $H$ source policies, and (b) a target policy fine-tuning step where the learned representation is used to parameterize the policy class. We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x}{HN_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$, where $n_x > k$ is the state dimension, $n_u$ is the input dimension, $N_{\mathrm{shared}}$ denotes the total amount of data collected for each policy during representation learning, and $N_{\mathrm{target}}$ is the amount of target task data. This result formalizes the intuition that aggregating data across related tasks to learn a representation can significantly improve the sample efficiency of learning a target task. The trends suggested by this bound are corroborated in simulation.
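To make the trade-off in the bound above concrete, the following sketch evaluates its two terms with constants and logarithmic factors dropped, so the numbers indicate scaling only, not actual imitation gaps. The function name and the example values are hypothetical.

```python
def imitation_gap_rate(
    k: int, n_x: int, n_u: int, H: int, n_shared: int, n_target: int
) -> float:
    """Scaling of the bound: k*n_x / (H*N_shared) + k*n_u / N_target.

    Constants and log factors hidden by the O-tilde notation are ignored.
    """
    representation_term = k * n_x / (H * n_shared)  # shrinks with more source policies
    target_term = k * n_u / n_target                # shrinks with more target data
    return representation_term + target_term


# Example with made-up sizes: doubling the number of source policies H
# halves the representation term but leaves the target term unchanged.
few_sources = imitation_gap_rate(k=2, n_x=10, n_u=4, H=5, n_shared=100, n_target=20)
more_sources = imitation_gap_rate(k=2, n_x=10, n_u=4, H=10, n_shared=100, n_target=20)
```

In this example the target term dominates, so adding source policies helps only marginally; the shared data matters most when $N_{\mathrm{target}}$ is large relative to $H N_{\mathrm{shared}}$.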
Forecasting the state of vegetation in response to climate and weather events is a major challenge. Its implementation will prove crucial in predicting crop yield, forest damage, or more generally the impact on ecosystem services relevant for socio-economic functioning, which if absent can lead to humanitarian disasters. Vegetation status depends on weather and environmental conditions that modulate complex ecological processes taking place at several timescales. Interactions between vegetation and different environmental drivers produce both instantaneous and time-lagged responses, often showing an emerging spatial context at landscape and regional scales. We formulate the land surface forecasting task as a strongly guided video prediction task where the objective is to forecast vegetation development at very fine resolution, using topography and weather variables to guide the prediction. We use a Convolutional LSTM (ConvLSTM) architecture to address this task and predict changes in the vegetation state in Africa using Sentinel-2 satellite NDVI, with ERA5 weather reanalysis, SMAP satellite measurements, and topography (DEM of SRTMv4.1) as variables to guide the prediction. Our results highlight how ConvLSTM models can forecast not only the seasonal evolution of NDVI at high resolution, but also the differential impacts of weather anomalies over the baselines. The model is able to predict different vegetation types, even those with very high NDVI variability during the target period, which is promising for supporting anticipatory actions in the context of drought-related disasters.
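For readers unfamiliar with the target variable, NDVI is the normalized difference between near-infrared and red reflectance. The sketch below uses the standard definition; returning 0.0 when both reflectances are zero is a convention assumed here, not something stated in the abstract.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    if denom == 0.0:
        # Assumed convention for pixels with no signal in either band.
        return 0.0
    return (nir - red) / denom
```

For Sentinel-2, the near-infrared and red inputs are typically bands B8 and B4. Dense vegetation gives values close to 1, while bare soil and water give values near or below 0.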
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.
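The idea of using the particle filter belief transition as the generative model for an MDP solver can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Sparse-PFT code: `pb_mdp_step` and the toy 1-D model functions are hypothetical names, and the belief is a plain list of weighted particles.

```python
import math
import random
from typing import Callable, List, Tuple

Belief = List[Tuple[float, float]]  # (state, weight) pairs


def pb_mdp_step(
    belief: Belief,
    action: float,
    transition: Callable[[float, float, random.Random], float],
    sample_obs: Callable[[float, random.Random], float],
    obs_density: Callable[[float, float], float],
    reward: Callable[[float, float], float],
    rng: random.Random,
) -> Tuple[Belief, float]:
    """One generative step of a particle belief MDP (illustrative sketch)."""
    weights = [w for _, w in belief]
    # Expected reward under the current weighted belief.
    r = sum(w * reward(s, action) for s, w in belief) / sum(weights)
    # Propagate each of the C particles: C transition samples per step.
    next_states = [transition(s, action, rng) for s, _ in belief]
    # Generate one observation from a particle drawn in proportion to its weight.
    idx = rng.choices(range(len(belief)), weights=weights)[0]
    obs = sample_obs(next_states[idx], rng)
    # Reweight every particle by the observation likelihood.
    new_belief = [
        (s2, w * obs_density(obs, s2)) for s2, w in zip(next_states, weights)
    ]
    return new_belief, r


# Toy 1-D model, assumed purely for illustration.
def toy_transition(s: float, a: float, rng: random.Random) -> float:
    return s + a + rng.gauss(0.0, 0.1)


def toy_sample_obs(s: float, rng: random.Random) -> float:
    return s + rng.gauss(0.0, 1.0)


def toy_obs_density(o: float, s: float) -> float:
    return math.exp(-0.5 * (o - s) ** 2) / math.sqrt(2.0 * math.pi)


def toy_reward(s: float, a: float) -> float:
    return -abs(s)
```

Each call draws one transition sample per particle, which matches the $\mathcal{O}(C)$ overhead mentioned above, and it needs the observation density to reweight the particles. A sampling-based MDP planner could call `pb_mdp_step` wherever it would normally call a state-level simulator.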
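Noise in seismic data arises from numerous sources and is continually evolving. Using supervised deep learning procedures to denoise seismic datasets often results in poor performance, owing to the lack of noise-free field data to act as training targets and the large difference in characteristics between synthetic and field datasets. Self-supervised blind-spot networks typically overcome these limitations by training directly on the raw, noisy data. However, such networks often rely on a random-noise assumption, and their denoising capability degrades quickly in the presence of even minimally correlated noise. Extending from blind spots to blind masks can efficiently suppress coherent noise along a specific direction, but it cannot adapt to the ever-changing properties of noise. To preempt the network's ability to predict the signal and reduce its opportunity to learn the noise properties, we propose an initial, supervised training of the network on a frugally generated synthetic dataset prior to fine-tuning in a self-supervised manner on the field dataset of interest. Considering the change in peak signal-to-noise ratio, as well as the volume of noise reduced and the signal leakage observed, we illustrate the clear benefit of initializing the self-supervised network with the weights from supervised base training. This is further supported by a test on a field dataset, where the fine-tuned network strikes the best balance between signal preservation and noise reduction. Finally, using the unrealistic, frugally generated synthetic dataset for the supervised base training brings a number of benefits: it requires minimal prior geological knowledge, it substantially reduces the computational cost of dataset generation, and it reduces the need to retrain the network should recording conditions change, to name a few.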
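We study the performance of CLAIRE, a diffeomorphic multi-node, multi-GPU image-registration algorithm and software package, in large-scale biomedical imaging applications with billions of voxels. At such resolutions, most existing software packages for diffeomorphic image registration are prohibitively expensive. As a result, practitioners first significantly downsample the original images and then register them using existing tools. Our main contribution is an extensive analysis of the impact of downsampling on registration performance. We study this impact by comparing full-resolution registrations obtained with CLAIRE to lower-resolution registrations for synthetic and real-world imaging datasets. Our results suggest that registration at full resolution can yield superior registration quality, but not always. For example, downsampling a synthetic image from $1024^3$ to $256^3$ decreases the Dice coefficient from 92% to 79%. However, the differences are less pronounced for noisy or low-contrast high-resolution images. CLAIRE allows us not only to register images of clinically relevant size in a few seconds, but also to register images at unprecedented resolution in a reasonable time. The highest resolution considered is CLARITY images of size $2816\times3016\times1162$. To the best of our knowledge, this is the first study of image registration quality at such resolutions.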
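The Dice coefficient used above as the quality measure can be computed as in the following sketch, which assumes two binary label masks flattened into equal-length sequences. The function name is hypothetical, and this is the standard definition rather than CLAIRE's own implementation.

```python
from typing import Sequence


def dice_coefficient(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    size = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if size == 0 else 2.0 * intersection / size
```

A value of 1.0 means the warped and reference labels overlap exactly, so the reported drop from 92% to 79% corresponds to a substantial loss of overlap after downsampling.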